This notebook is part of a tutorial series for the FRETBursts burst analysis software.
In this notebook, we show a few examples of how to export FRETBursts burst data to a file.
We start by loading FRETBursts:
In [ ]:
from fretbursts import *
In [ ]:
sns = init_notebook()
The full list of smFRET measurements used in the FRETBursts tutorials can be found on Figshare.
This is the file we will download:
In [ ]:
url = 'http://files.figshare.com/2182601/0023uLRpitc_NTP_20dT_0.5GndCl.hdf5'
url
You can change the url variable above to download your own data file. This is useful if you are executing FRETBursts online and you want to use your own data file. See First Steps.
Here, we download the data file and put it in a folder named data, inside the notebook folder:
In [ ]:
download_file(url, save_dir='./data')
NOTE: If you modified the url variable and provided an invalid URL, the previous command will fail. In this case, edit the cell containing the url and retry the download.
Here, we can directly define the file name to be loaded:
In [ ]:
filename = "./data/0023uLRpitc_NTP_20dT_0.5GndCl.hdf5"
filename
Let's check that the file exists:
In [ ]:
import os
if os.path.isfile(filename):
    print("Perfect, file found!")
else:
    print("Sorry, file:\n%s not found" % filename)
In [ ]:
d = loader.photon_hdf5(filename)
In [ ]:
d.ph_times_t, d.det_t
We need to define some ALEX parameters:
In [ ]:
d.add(det_donor_accept=(0, 1),
      alex_period=4000,
      offset=700,
      D_ON=(2180, 3900),
      A_ON=(200, 1800))
Here the parameters are:
- det_donor_accept: donor and acceptor channels
- alex_period: length of the excitation alternation period (in timestamp units)
- D_ON and A_ON: donor and acceptor excitation windows
- offset: the offset between the start of the alternation and the start of timestamping (see also Definition of alternation periods).

To check that the above parameters are correct, we plot the histogram of timestamps (modulo the alternation period) and superimpose the two excitation-period definitions on it:
In [ ]:
bpl.plot_alternation_hist(d)
If the previous alternation histogram looks correct, the corresponding definitions of the excitation periods can be applied to the data using the following command:
In [ ]:
loader.alex_apply_period(d)
If the previous histogram does not look right, the parameters in the d.add(...) cell can be modified and checked by re-running the histogram plot cell until everything looks fine. Don't forget to apply the parameters with loader.alex_apply_period(d) as a last step.
NOTE: After applying the ALEX parameters, a new array of timestamps containing only the photons inside the excitation periods is created (named d.ph_times_m). To save memory, by default, the old timestamps array (d.ph_times_t) is deleted. Therefore, in the following, when we talk about the all-photon selection we always refer to all photons inside both excitation periods.
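As a quick, optional check (not part of the original workflow), we can confirm that the in-period timestamps are now available in d.ph_times_m (one array per spot):
In [ ]:
# Optional sanity check: after alex_apply_period, the timestamps inside the
# excitation periods are stored in d.ph_times_m (one array per spot).
print('Photons inside excitation periods:', d.ph_times_m[0].size)
Next, we compute the background and run the burst search: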
In [ ]:
d.calc_bg(bg.exp_fit, time_s=30, tail_min_us='auto', F_bg=1.7)
In [ ]:
d.burst_search(L=10, m=10, F=6)
First we filter the bursts to avoid creating big files:
In [ ]:
ds = d.select_bursts(select_bursts.size, th1=60)
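To see the effect of the filter (an optional check, just for illustration), we can compare the number of bursts before and after the size selection using the num_bursts attribute:
In [ ]:
# Number of bursts before and after the size filter (one value per spot).
print('All bursts:     ', d.num_bursts)
print('Selected bursts:', ds.num_bursts)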
By burst data we mean all the scalar burst parameters, e.g. size, duration, background, etc.
We can easily get a table (a pandas DataFrame) with all the burst data as follows:
In [ ]:
bursts = bext.burst_data(ds, include_bg=True, include_ph_index=True)
bursts.head(5) # display first 5 bursts
Once we have the DataFrame, saving it to disk in CSV format is trivial:
In [ ]:
bursts.to_csv('%s_burst_data.csv' % filename.replace('.hdf5', ''))
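As a quick sanity check (just an example, not required by the workflow; the names csv_fname and bursts_check below are introduced only here), we can read the CSV back with pandas. The first column stores the burst index, so we use it as the DataFrame index:
In [ ]:
import pandas as pd

# Read back the burst-data CSV we just wrote; index_col=0 restores the burst index.
csv_fname = '%s_burst_data.csv' % filename.replace('.hdf5', '')
bursts_check = pd.read_csv(csv_fname, index_col=0)
bursts_check.head(5)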
Exporting timestamps and other photon data for each burst is a little trickier, because the data is less uniform (i.e. each burst has a different number of photons). In the following example, we will save a CSV file with variable-length columns. Each burst is represented by two lines: one line for the timestamps and one line for the photon stream (excitation/emission period) the timestamps belong to.
Let's start by creating an array of photon streams containing one of the values 0, 1, 2 or 3 for each timestamp. These values correspond to the DexDem, DexAem, AexDem and AexAem photon streams, respectively.
In [ ]:
ds.A_ex
In [ ]:
# {0: DexDem, 1: DexAem, 2: AexDem, 3: AexAem}
In [ ]:
(ds.A_ex[0].view('int8') << 1) + ds.A_em[0].view('int8')
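As a small optional check of this encoding (the variable stream_check below is introduced only for this example): A_ex provides the high bit and A_em the low bit, so the only codes that should appear are 0, 1, 2 and 3:
In [ ]:
import numpy as np

# A_ex (high bit) and A_em (low bit) combine into the codes 0-3.
stream_check = (ds.A_ex[0].view('int8') << 1) + ds.A_em[0].view('int8')
np.unique(stream_check, return_counts=True)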
Now we define a header documenting the file format. We will also include the filename of the measurement.
This first version is just an example that also includes nanotimes:
In [ ]:
header = """\
# BPH2CSV: %s
# Lines per burst: 3
# - timestamps (int64): in 12.5 ns units
# - nanotimes (int16): in raw TCSPC units (3.3 ps)
# - stream (uint8): the photon stream according to the mapping {0: DexDem, 1: DexAem, 2: AexDem, 3: AexAem}
""" % filename
print(header)
And this is the header we are actually going to use:
In [ ]:
header = """\
# BPH2CSV: %s
# Lines per burst: 2
# - timestamps (int64): in 12.5 ns units
# - stream (uint8): the photon stream according to the mapping {0: DexDem, 1: DexAem, 2: AexDem, 3: AexAem}
""" % filename
print(header)
We can now save the data to disk:
In [ ]:
out_fname = '%s_burst_timestamps.csv' % filename.replace('.hdf5', '')
dx = ds
ich = 0
bursts = dx.mburst[ich]
timestamps = dx.ph_times_m[ich]
stream = (dx.A_ex[ich].view('int8') << 1) + dx.A_em[ich].view('int8')
with open(out_fname, 'wt') as f:
    f.write(header)
    for times, period in zip(bl.iter_bursts_ph(timestamps, bursts),
                             bl.iter_bursts_ph(stream, bursts)):
        times.tofile(f, sep=',')
        f.write('\n')
        period.tofile(f, sep=',')
        f.write('\n')
Done!
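Before reading the file back, we can peek at its first few lines to verify that the layout matches the header (a quick, optional check):
In [ ]:
# Print the header lines plus the two lines of the first burst.
with open(out_fname) as f:
    for _ in range(header.count('\n') + 2):
        print(f.readline().rstrip())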
In [ ]:
import pandas as pd
Now let's read back the file we just saved. We start by reading the header and computing some file-specific constants.
In [ ]:
with open(out_fname) as f:
    lines = []
    lines.append(f.readline())
    while lines[-1].startswith('#'):
        lines.append(f.readline())
header = ''.join(lines[:-1])
print(header)
In [ ]:
stream_map = {0: 'DexDem', 1: 'DexAem', 2: 'AexDem', 3: 'AexAem'}
nrows = int(header.split('\n')[1].split(':')[1].strip())
header_len = len(header.split('\n')) - 1
header_len, nrows
As a test, we load the data for the first burst into a DataFrame, converting the numeric "stream" column into photon-stream names (strings). The new column is of categorical type, so it takes very little space:
In [ ]:
burstph = (pd.read_csv(out_fname, skiprows=header_len, nrows=nrows, header=None).T
             .rename(columns={0: 'timestamp', 1: 'stream'}))
burstph.stream = (burstph.stream
                  .map(stream_map)
                  .astype(pd.CategoricalDtype(categories=['DexDem', 'DexAem', 'AexDem', 'AexAem'],
                                              ordered=True)))
burstph
burstph
To read the whole file, we use a different approach. First, we load the entire file into two lists of lists (one for the timestamps and one for the streams). Then, we create a single DataFrame with a third column indicating the burst index.
In [ ]:
import csv

# Read the data into two lists of lists
t_list, s_list = [], []
with open(out_fname) as f:
    for _ in range(header_len):
        f.readline()
    csvreader = csv.reader(f)
    for row in csvreader:
        t_list.append([int(v) for v in row])
        s_list.append([int(v) for v in next(csvreader)])

# Turn the inner lists into pandas.DataFrames
d_list = []
for ib, (t, s) in enumerate(zip(t_list, s_list)):
    d_list.append(
        pd.DataFrame({'timestamp': t, 'stream': s}, columns=['timestamp', 'stream'])
        .assign(iburst=ib)
    )

# Concatenate dataframes
burstph = pd.concat(d_list, ignore_index=True)

# Convert the stream column into an (ordered) categorical
burstph.stream = (burstph.stream
                  .map(stream_map)
                  .astype(pd.CategoricalDtype(categories=['DexDem', 'DexAem', 'AexDem', 'AexAem'],
                                              ordered=True)))
burstph
In [ ]:
burstph.dtypes
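As an illustration of how this long-format table can be used (purely an example, not part of the export itself; the variable counts is introduced only here), we can count the photons detected in each stream for every burst:
In [ ]:
# Photon counts per stream for each burst (rows: bursts, columns: streams).
counts = burstph.groupby(['iburst', 'stream'], observed=False).size().unstack(fill_value=0)
counts.head()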
This was just an example. There are certainly more efficient ways to read the file into a DataFrame. Please feel free to contribute new methods to illustrate a different (more efficient or simpler) approach.